Benchmarking Mobile Device Control Agents across Diverse Configurations

Lee, Juyong, Min, Taywon, An, Minyong, Kim, Changyeon, Lee, Kimin

arXiv.org Artificial Intelligence

Developing autonomous agents for mobile devices can significantly enhance user interactions by offering increased efficiency and accessibility. However, despite the growing interest in mobile device control agents, the absence of a commonly adopted benchmark makes it challenging to quantify scientific progress in this area. In this work, we introduce B-MoCA: a novel benchmark designed specifically for evaluating mobile device control agents. To create a realistic benchmark, we develop B-MoCA based on the Android operating system and define 60 common daily tasks. Importantly, we incorporate a randomization feature that changes various aspects of mobile devices, including user interface layouts and language settings, to assess generalization performance. We benchmark diverse agents, including agents employing large language models (LLMs) or multi-modal LLMs as well as agents trained from scratch using human expert demonstrations. While these agents demonstrate proficiency in executing straightforward tasks, their poor performance on complex tasks highlights significant opportunities for future research to enhance their effectiveness. Our source code is publicly available at https://b-moca.github.io.
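The per-episode randomization the abstract describes — varying device settings such as UI language or layout so agents are scored on generalization rather than memorized screens — can be sketched roughly as follows. The class, field names, and value sets here are illustrative assumptions, not B-MoCA's actual configuration space or API.

```python
import random
from dataclasses import dataclass

# Hypothetical device-configuration space, in the spirit of the abstract's
# randomization feature; the real benchmark's options will differ.
LANGUAGES = ["en", "ko", "es", "de"]
ICON_SIZES = ["small", "medium", "large"]

@dataclass
class DeviceConfig:
    language: str
    icon_size: str
    wallpaper_id: int

def sample_config(rng: random.Random) -> DeviceConfig:
    """Draw a fresh device configuration for one evaluation episode."""
    return DeviceConfig(
        language=rng.choice(LANGUAGES),
        icon_size=rng.choice(ICON_SIZES),
        wallpaper_id=rng.randrange(100),
    )

rng = random.Random(42)
configs = [sample_config(rng) for _ in range(5)]
```

Seeding the generator makes an evaluation run reproducible while still exposing the agent to a different configuration on every episode.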


ERSAM: Neural Architecture Search For Energy-Efficient and Real-Time Social Ambiance Measurement

Li, Chaojian, Chen, Wenwan, Yuan, Jiayi, Lin, Yingyan, Sabharwal, Ashutosh

arXiv.org Artificial Intelligence

Social ambiance describes the context in which social interactions happen, and can be measured using speech audio by counting the number of concurrent speakers. This measurement has enabled various mental health tracking and human-centric IoT applications. While on-device Social Ambiance Measurement (SAM) is highly desirable to ensure user privacy and thus facilitate wide adoption of the aforementioned applications, the computational complexity of state-of-the-art deep neural network (DNN) powered SAM solutions stands at odds with the often constrained resources on mobile devices. Furthermore, only limited labeled data is available or practical for SAM in clinical settings, due to privacy constraints and the human effort required, further challenging the achievable accuracy of on-device SAM solutions. To this end, we propose a dedicated neural architecture search framework for Energy-efficient and Real-time SAM (ERSAM). Specifically, our ERSAM framework can automatically search for DNNs that push forward the achievable accuracy vs. hardware efficiency frontier of mobile SAM solutions. For example, an ERSAM-delivered DNN consumes only 40 mW x 12 h of energy and 0.05 seconds of processing latency for a 5-second audio segment on a Pixel 3 phone, while achieving an error rate of only 14.3% on a social ambiance dataset generated from LibriSpeech. We expect our ERSAM framework to pave the way for ubiquitous on-device SAM solutions, which are in growing demand.
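The "accuracy vs. hardware efficiency frontier" the abstract pushes forward can be made concrete: given candidate architectures scored on error rate and energy, a hardware-aware search keeps only the non-dominated ones. A minimal Pareto-frontier sketch follows; the candidate numbers are made up for illustration and are not ERSAM's actual search results.

```python
def pareto_frontier(candidates):
    """Return candidates not dominated on (error_rate, energy); lower is better on both."""
    frontier = []
    for c in sorted(candidates):  # ascending error rate, then energy
        # Keep a candidate only if it is strictly cheaper than the last kept one;
        # otherwise some already-kept candidate dominates it.
        if not frontier or c[1] < frontier[-1][1]:
            frontier.append(c)
    return frontier

# Hypothetical (error_rate, relative_energy) pairs for searched architectures
candidates = [(0.143, 0.48), (0.120, 0.90), (0.200, 0.30), (0.143, 0.60)]
frontier = pareto_frontier(candidates)
# frontier -> [(0.120, 0.90), (0.143, 0.48), (0.200, 0.30)]
```

Each point on the returned frontier represents a different accuracy/efficiency trade-off a deployment could pick, which is the shape of result a NAS framework like the one described would surface.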


From Patterson Maps to Atomic Coordinates: Training a Deep Neural Network to Solve the Phase Problem for a Simplified Case

Hurwitz, David

arXiv.org Machine Learning

This work demonstrates that, for a simple case of 10 randomly positioned atoms, a neural network can be trained to infer atomic coordinates from Patterson maps. The network was trained entirely on synthetic data. For the training set, the network outputs were 3D maps of randomly positioned atoms. From each output map, a Patterson map was generated and used as input to the network. The network generalized to cases not in the training set, inferring atom positions from Patterson maps. A key finding in this work is that the Patterson maps presented to the network input during training must uniquely describe the atomic coordinates they are paired with on the network output, or the network will not train and it will not generalize. The network cannot train on conflicting data. Conflicts are avoided in three ways: 1. Patterson maps are invariant to translation. To remove this degree of freedom, output maps are centered on the average of their atom positions. 2. Patterson maps are invariant to centrosymmetric inversion. This conflict is removed by presenting the network output with both the atoms used to make the Patterson map and their centrosymmetry-related counterparts simultaneously. 3. The Patterson map does not uniquely describe a set of coordinates because the origin for each vector in the Patterson map is ambiguous. By adding empty space around the atoms in the output map, this ambiguity is removed. Forcing output atoms to be closer together than half the output box edge dimension means the origin of each peak in the Patterson map must be the origin to which it is closest.
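Generating a Patterson map from an atom map, as done for each training pair, follows directly from the definition: the Patterson function is the autocorrelation of the density, computable as the inverse Fourier transform of the squared structure-factor amplitudes. A minimal NumPy sketch, with grid size and point-atom model chosen for illustration rather than taken from the paper:

```python
import numpy as np

def patterson_map(density):
    # Autocorrelation of the density via the FFT: the Patterson map is the
    # inverse Fourier transform of |F|^2, where F are the structure factors.
    F = np.fft.fftn(density)
    return np.real(np.fft.ifftn(np.abs(F) ** 2))

rng = np.random.default_rng(0)
N = 32
density = np.zeros((N, N, N))
# 10 randomly positioned point atoms, echoing the paper's simplified case
for idx in rng.integers(0, N, size=(10, 3)):
    density[tuple(idx)] = 1.0

p = patterson_map(density)
```

The resulting map has the properties the abstract relies on: it is centrosymmetric and translation-invariant, with its origin peak equal to the sum of the squared density, which is why the output-side conventions (centering, centrosymmetric pairs, empty padding) are needed to make the training targets unique.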


Pixel 4 gets automatic robocall screening, improved location accuracy, and more

#artificialintelligence

If Google's Pixel 4 is your daily driver, good news: it's now able to screen robocalls -- and more. Google announced this morning an update to the Pixel 4's Call Screen feature in the U.S. that automatically declines calls from unknown parties and filters out suspected robocallers, alongside an improved video calling experience on Duo, the rollout of the new Google Assistant to more users, and a zippier software experience made possible by memory usage optimizations. It's part of what Google is calling "feature drops," which will deliver "bigger updates" to Pixel devices with "more helpful and fun features" going forward. The first arrives starting today, with others to follow on a monthly cadence. "Pixel phones have always received monthly updates to improve performance and make your device safe," wrote Google group product manager Shenaz Zack in a blog post.


Google will bring its AI-powered Recorder app to older Pixel devices

#artificialintelligence

When Google unveiled its voice recording app, aptly called Recorder, at its Pixel event last month, we were impressed by its capabilities. Unlike standard run-of-the-mill recording apps that simply make and store audio files, Google's version uses its AI smarts to perform real-time voice transcription into text, and it can locate music and specific words inside the audio file -- all without a cloud connection. Initially announced as a Pixel 4 exclusive, Recorder will be made backward-compatible with older Pixel devices, a Google employee has confirmed on Reddit. It all started with member Valendr0s voicing his frustration on Reddit that the Recorder app wasn't compatible with his Pixel 3 XL and his wife's Pixel 2, noting how useful the app would be during visits to his wife's doctor. A day later, a Googler with the handle PixelCommunity responded on Reddit, affirming Recorder's upcoming compatibility with older Pixel phones.


Pixel 4 review: Google's latest smartphone is very good but not great

USATODAY - Tech Top Stories

Google has never widely been considered the top banana when it comes to smartphones, a designation bestowed instead on Samsung or Apple, depending on whether your loyalties lie with Android or iOS. But for the last couple of years, Google's Pixels have presented an awfully strong case: solid Android phones with superb cameras that you can usually get for less than you pay for a top Galaxy or iPhone. So it goes with the Pixel 4 I've been using over the past several days. It has a 5.7-inch display and starts at $799 (or $899 for its larger 6.3-inch sibling, the Pixel 4 XL), and for the first time it is being embraced by all the U.S. wireless carriers out of the gate; in past years, Verizon had the exclusive. As with other Pixels, the obedient Google Assistant is readily at hand, summoned through a familiar "Hey, Google" or "OK, Google" command, by tapping an icon, and now even by squeezing the sides of the phone.


Pixel 4 and Pixel 4 XL review: Function over form

#artificialintelligence

Annually since 2016, Google has released a pair of flagship Pixel smartphones designed to showcase the very best of Android. This year was like any other with the debut of the Pixel 4 and Pixel 4 XL, which ship running Android 10. But what's unusual this time around is that the newest duo's hardware is perhaps just as compelling as their software. Gone is the two-tone rear cover that featured prominently on the original Pixel, Pixel 2, and Pixel 3 series, replaced with polished and grippy Corning Gorilla Glass 5. It's easier to get a grip on than the Pixel 3 and Pixel 3 XL, and it's more resistant to oily fingers and pocket lint. The Pixel 4 series is IP68 certified to withstand up to five feet of water for half an hour, which puts it on par with the outgoing Pixel 3 series. But both the Pixel 4 (5.71 ounces) and the Pixel 4 XL are a good deal heavier than the Pixel 3. The Pixel 4 series' frame is coated with a soft-touch material that's jet black in all three of the colorways -- Clearly White, Just Black, and the limited edition Oh So Orange. The haptics, which Google characterizes as "sharp and textured," feel great.


ProBeat: Google's Pixel 4 ups the AI ante to offline language models

#artificialintelligence

Google's Pixel phones are the company's preferred way of showcasing its AI chops to consumers. Pixel phones consistently set the phone camera bar thanks to Google's AI prowess. But many of the AI features have nothing to do with the camera. The Pixel 4 and Pixel 4 XL unveiled this week at the Made by Google hardware event in New York City continue this tradition. Camera improvements aside, the Pixel 4 makes a play for a new arena that Google clearly wants to rule: offline natural language processing.


AI is Transforming Mobile Technology

#artificialintelligence

AI improves both hardware and software on mobile phones, with implications for marketers and consumers. Mobile AI helps you automate some of your 35,000 decisions per day. The average person spends 2.5 hours a day looking at their phone and makes 35,000 decisions in a single day. The reality is that we often rely on one another for help with those choices -- and increasingly, our smartphones help us make them. Even now, Artificial Intelligence (AI) is a large part of how you use your smartphone.


Google's Pixel 4 will come with motion-sensing tech so you can control phone without touching it

Daily Mail - Science & tech

Google has confirmed one of the biggest rumors surrounding the upcoming Pixel 4, revealing a new technology that allows users to operate the device using only hand gestures. In a blog post, Google confirmed that its newest Pixel iteration will include what it describes as 'motion-sense' technology -- a short-range radar that enables users to control their phones from a distance. The Pixel 4 will be the first of Google's devices using the technology, called Soli, and according to the company will allow customers to 'skip songs, snooze alarms, and silence phone calls, just by waving your hand.' 'These capabilities are just the start, and just as Pixels get better over time, Motion Sense will evolve as well,' wrote Google in a post. One of the most intriguing applications of Soli will combine the technology with facial recognition software -- another new addition to the device -- to make the 'unlocking' process more seamless. Google says the Pixel 4 will use its radar technology to sense when you're about to pick the device up, enabling the phone to preemptively activate its facial recognition feature and theoretically expedite the unlock process.